Often, we wish to link different data sets to form one single data set.
For example, we might have a data set with individual player information (name, age, height), a data set with a series of observations for each player (obs 1, obs 2, obs 3), and a further data set with information about each observation session.
21.2 Create dataframes
# Load dplyr for the full join functionalitylibrary(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
# Step 1: Create Data Frame 1df1 <-data.frame(ID =1:5,Name =c("Fred", "Bob", "Charlie", "David", "John"),Age =c(22, 21, 24, 19, 23))# Step 2: Create Data Frame 2# This data frame has a common field 'Name' with df1df2 <-data.frame(Name =c("Fred", "Bob", "Charlie", "David", "John", "Fred", "Bob", "Charlie", "David", "John", "Fred", "Bob", "Charlie", "David", "John"),Obs =c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),Val =c(10, 20, 30, 40, 50, 15, 25, 35, 45, 55, 20, 30, 40, 50, 60))# Step 3: Create Data Frame 3# This data frame also has a common field 'Obs' with df2df3 <-data.frame(Obs =c(1, 2, 3, 4),Day =c("Monday", "Wednesday", "Thursday", "Friday"),Weather =c("Excellent", "Good", "Average", "Good"))
These look like:
# Display the data framesprint("Data Frame 1:")
[1] "Data Frame 1:"
print(df1)
ID Name Age
1 1 Fred 22
2 2 Bob 21
3 3 Charlie 24
4 4 David 19
5 5 John 23
print("Data Frame 2:")
[1] "Data Frame 2:"
print(df2)
Name Obs Val
1 Fred 1 10
2 Bob 1 20
3 Charlie 1 30
4 David 1 40
5 John 1 50
6 Fred 2 15
7 Bob 2 25
8 Charlie 2 35
9 David 2 45
10 John 2 55
11 Fred 3 20
12 Bob 3 30
13 Charlie 3 40
14 David 3 50
15 John 3 60
print("Data Frame 3:")
[1] "Data Frame 3:"
print(df3)
Obs Day Weather
1 1 Monday Excellent
2 2 Wednesday Good
3 3 Thursday Average
4 4 Friday Good
22 Merge dataframes
Now, we can merge the datasets using their common fields:
# Step 4: Merge Data Frame 1 and Data Frame 2 on 'ID'merged_df1_df2 <-merge(df1, df2, by ="Name", all =FALSE) # inner join# Step 5: Merge the result with Data Frame 3 on 'ID'final_df <-merge(merged_df1_df2, df3, by ="Obs", all =FALSE) # inner join# Display the merged data frameprint("Final Merged Data Frame (Inner Join):")
[1] "Final Merged Data Frame (Inner Join):"
print(final_df)
Obs Name ID Age Val Day Weather
1 1 Bob 2 21 20 Monday Excellent
2 1 David 4 19 40 Monday Excellent
3 1 Charlie 3 24 30 Monday Excellent
4 1 John 5 23 50 Monday Excellent
5 1 Fred 1 22 10 Monday Excellent
6 2 Bob 2 21 25 Wednesday Good
7 2 David 4 19 45 Wednesday Good
8 2 Charlie 3 24 35 Wednesday Good
9 2 John 5 23 55 Wednesday Good
10 2 Fred 1 22 15 Wednesday Good
11 3 Bob 2 21 30 Thursday Average
12 3 John 5 23 60 Thursday Average
13 3 Charlie 3 24 40 Thursday Average
14 3 David 4 19 50 Thursday Average
15 3 Fred 1 22 20 Thursday Average
Alternatively, if you want a full outer join to keep all rows from all data frames:
final_df_full <- df1 %>%full_join(df2, by ="Name") %>%full_join(df3, by ="Obs")# Display the full outer join resultprint("Final Merged Data Frame (Full Outer Join):")
[1] "Final Merged Data Frame (Full Outer Join):"
print(final_df_full)
ID Name Age Obs Val Day Weather
1 1 Fred 22 1 10 Monday Excellent
2 1 Fred 22 2 15 Wednesday Good
3 1 Fred 22 3 20 Thursday Average
4 2 Bob 21 1 20 Monday Excellent
5 2 Bob 21 2 25 Wednesday Good
6 2 Bob 21 3 30 Thursday Average
7 3 Charlie 24 1 30 Monday Excellent
8 3 Charlie 24 2 35 Wednesday Good
9 3 Charlie 24 3 40 Thursday Average
10 4 David 19 1 40 Monday Excellent
11 4 David 19 2 45 Wednesday Good
12 4 David 19 3 50 Thursday Average
13 5 John 23 1 50 Monday Excellent
14 5 John 23 2 55 Wednesday Good
15 5 John 23 3 60 Thursday Average
16 NA <NA> NA 4 NA Friday Good